Seeing Stars from Reviews by a Semantic-based Approach with MapReduce Implementation
Abstract
This study concerns the problem of aspect-level opinion (sentiment) mining from online reviews. The problem consists of two fundamental sub-tasks: aspect extraction (identifying specific aspects of the product from reviews) and aspect rating estimation (assigning a numerical rating to each aspect). Solving this problem is useful for many applications, e.g., providing aspect-level review summaries that help consumers make better decisions and help product manufacturers collect summarized user feedback. Our objective is to propose a semantic-based approach for aspect-level opinion mining from massive amounts of reviews in a scalable fashion. The MapReduce implementation of this approach substantially reduces runtime compared with the single-process implementation. Experimental results show that the runtime reduction achieved by the MapReduce implementation is almost linear in the number of mappers, e.g., a reduction of around 7.4 times with 10 mappers on the TripAdvisor dataset and 2.6 times with 4 mappers on the Yelp dataset. The number of mappers and reducers can be configured on demand to handle very large datasets in a scalable fashion. Moreover, the semantic-based approach performs well for aspect rating estimation on the TripAdvisor dataset, with a mean absolute error (MAE) of around 1.0 on all aspects, which means that the average deviation between the human rating and the estimated rating is around 1 star. The source code of our implementation of the semantic-based approach can be downloaded from https://github.com/ppfliu/aspect-opinion.
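The two figures quoted in the abstract, MAE and runtime reduction per mapper, can be illustrated with a minimal Python sketch. This is not taken from the authors' repository: the function names `mean_absolute_error` and `parallel_efficiency` and the sample ratings are hypothetical, and only the 7.4x/10-mapper and 2.6x/4-mapper figures come from the abstract.

```python
from typing import Sequence


def mean_absolute_error(human: Sequence[float], estimated: Sequence[float]) -> float:
    """Average absolute gap, in stars, between human and estimated ratings."""
    if len(human) != len(estimated) or len(human) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(abs(h - e) for h, e in zip(human, estimated)) / len(human)


def parallel_efficiency(speedup: float, mappers: int) -> float:
    """Fraction of ideal linear speedup achieved (1.0 means perfectly linear)."""
    if mappers <= 0:
        raise ValueError("mappers must be positive")
    return speedup / mappers


if __name__ == "__main__":
    # Hypothetical ratings for one aspect (illustrative values only).
    human = [5, 3, 4, 2]
    estimated = [4, 4, 4, 1]
    print("MAE:", mean_absolute_error(human, estimated))

    # Runtime reductions reported in the abstract.
    print("TripAdvisor efficiency:", parallel_efficiency(7.4, 10))
    print("Yelp efficiency:", parallel_efficiency(2.6, 4))
```

Under this reading, an efficiency near 1.0 would correspond to a perfectly linear reduction; the reported figures work out to roughly 0.74 and 0.65.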
Similar resources
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user’s item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
An Effective and Efficient MapReduce Algorithm for Computing BFS-Based Traversals of Large-Scale RDF Graphs
Nowadays, a leading instance of big data is represented by Web data that lead to the definition of so-called big Web data. Indeed, extending beyond to a large number of critical applications (e.g., Web advertisement), these data expose several characteristics that clearly adhere to the well-known 3V properties (i.e., volume, velocity, variety). Resource Description Framework (RDF) is a signific...
Scalable Distributed Reasoning Using MapReduce
We address the problem of scalable distributed reasoning, proposing a technique for materialising the closure of an RDF graph based on MapReduce. We have implemented our approach on top of Hadoop and deployed it on a compute cluster of up to 64 commodity machines. We show that a naive implementation on top of MapReduce is straightforward but performs badly and we present several non-trivial opt...
Towards an Ontology-Based Semantic Approach to Tuning Parameters to Improve Hadoop Application Performance
Hadoop MapReduce helps companies and researchers process large volumes of data. Hadoop has many configuration parameters that must be tuned in order to obtain better application performance. However, the best tuning of the parameters is not easily obtained by inexperienced users. Therefore, it is necessary to create environments that promote and motivate information shar...
A Fast Algorithm for Covering Rectangular Orthogonal Polygons with a Minimum Number of r-Stars
Introduction: This paper presents an algorithm for covering orthogonal polygons with a minimum number of guards. The approach examines the minimum number of guards for simple orthogonal polygons (without holes) in all scenarios and can also find a rectangular area for each guard. We consider the problem of covering orthogonal polygons with a minimum number of r-stars. In each orthogonal polygon P,...